AITopics | local window

Collaborating Authors

local window

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

C-3TO: Continuous 3D Trajectory Optimization on Neural Euclidean Signed Distance Fields

Gil, Guillermo, Cobano, Jose Antonio, Merino, Luis, Caballero, Fernando

arXiv.org Artificial Intelligence, Sep-25-2025

Abstract: This paper introduces a novel framework for continuous 3D trajectory optimization in cluttered environments, leveraging online neural Euclidean Signed Distance Fields (ESDFs). Unlike prior approaches that rely on discretized ESDF grids with interpolation, our method directly optimizes smooth trajectories represented by fifth-order polynomials over a continuous neural ESDF, ensuring precise gradient information throughout the entire trajectory. Experimental results demonstrate that C-3TO produces collision-aware and dynamically feasible trajectories. Moreover, its flexibility in defining local window sizes and optimization parameters enables straightforward adaptation to diverse user needs without compromising performance. By combining continuous trajectory parameterization with a continuously updated neural ESDF, C-3TO establishes a robust and generalizable foundation for safe and efficient local replanning in aerial robotics. The source code is open source and can be found at: https://anonymous.4open.science/r/icra2026_

I. Introduction

Aerial robots have become increasingly popular for a wide range of real-world applications due to their ability to perform hazardous tasks more efficiently and, most importantly, more safely than humans [1][2]. Fast trajectory replanning remains a critical area of research, particularly in dynamic and unstructured environments. Equally important is maintaining a continuously updated representation of the drone's surroundings, which is essential for generating continuous, safe, and smooth 3D local trajectories in real time. This paper presents a framework for planning a continuous local trajectory on an online, neurally generated distance field.
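
As a rough illustration of the trajectory representation described in this entry, the sketch below evaluates a fifth-order polynomial per axis and penalizes samples that come too close to an obstacle. It is not the authors' implementation: the analytic sphere distance stands in for the neural ESDF, and the names `quintic`, `sphere_sdf`, `collision_cost`, and the `safe_dist` parameter are assumptions made for this example.

```python
import math

def quintic(coeffs, t):
    """Evaluate c0 + c1*t + ... + c5*t^5 using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result

def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Analytic signed distance to a sphere (assumed stand-in for a learned ESDF)."""
    return math.dist(point, center) - radius

def collision_cost(traj, safe_dist=0.5, samples=50, duration=1.0):
    """Sum of squared clearance violations at evenly spaced times.

    traj holds three coefficient lists (x, y, z), six coefficients each.
    """
    cost = 0.0
    for i in range(samples + 1):
        t = duration * i / samples
        p = tuple(quintic(axis, t) for axis in traj)
        violation = safe_dist - sphere_sdf(p)
        if violation > 0.0:
            cost += violation ** 2
    return cost

# Two straight-line trajectories along x, written as degenerate quintics.
far_traj = ([-3.0, 6.0, 0.0, 0.0, 0.0, 0.0], [2.0] + [0.0] * 5, [0.0] * 6)
near_traj = ([-3.0, 6.0, 0.0, 0.0, 0.0, 0.0], [1.2] + [0.0] * 5, [0.0] * 6)

print(collision_cost(far_traj))   # stays clear of the sphere
print(collision_cost(near_traj))  # passes within the safety margin
```

An optimizer would adjust the polynomial coefficients to reduce such a cost; with a differentiable distance field, the gradient is available at every sampled point rather than only at grid nodes.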

artificial intelligence, optimization problem, trajectory, (17 more...)

arXiv.org Artificial Intelligence

2509.20084

Country: Europe (0.46)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

Li, Yan, Zhang, Tianyi, Li, Zechuan, Han, Soyeon Caren

arXiv.org Artificial Intelligence, Feb-4-2025

Transformer-based Large Language Models (LLMs) struggle to process inputs exceeding their training context window, with performance degrading due to positional out-of-distribution (O.O.D.) issues that disrupt attention computations. Existing solutions, both fine-tuning and training-free methods, are limited by computational inefficiency, attention logit outliers, or loss of local positional information. To address this, we propose Greedy Attention Logit Interpolation (GALI), a training-free length extrapolation method that maximizes the utilization of pretrained positional intervals while avoiding attention logit outliers through attention logit interpolation. The results demonstrate that GALI consistently outperforms state-of-the-art training-free methods. Our findings reveal that LLMs interpret positional intervals unevenly within their training context window, suggesting that extrapolating within a smaller positional interval range yields superior results, even for short-context tasks. GALI represents a significant step toward resolving the positional O.O.D. challenge, enabling more reliable long-text understanding in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/AcademyCityL/GALI.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.02659

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > China > Hunan Province > Changsha (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.41)

Efficient LLM Training and Serving with Heterogeneous Context Sharding among Attention Heads

Lin, Xihui, Zhang, Yunan, Ge, Suyu, Patra, Barun, Chaudhary, Vishrav, Peng, Hao, Song, Xia

arXiv.org Artificial Intelligence, Aug-27-2024

Existing LLM training and inference frameworks struggle to boost efficiency with sparsity while maintaining the integrity of context and model architecture. Inspired by the sharding concept in databases and the fact that attention parallelizes over heads on accelerators, we propose Sparsely-Sharded (S2) Attention, an attention algorithm that allocates heterogeneous context partitions to different attention heads to divide and conquer. S2-Attention restricts each attention head to attend only to a partition of the context following a strided sparsity pattern, while the full context is preserved as the union of all the shards. As attention heads are processed in separate thread blocks, the context reduction for each head can thus produce end-to-end speed-up and memory reduction. At inference, LLMs trained with S2-Attention get the KV cache reduction as a free lunch, with model quality preserved. In experiments, we show S2-Attention can provide (1) as much as 25.3X wall-clock attention speed-up over FlashAttention-2, resulting in a 6X reduction in end-to-end training time and a 10X reduction in inference latency, (2) on-par model training quality compared to default attention, and (3) perfect needle retrieval accuracy over a 32K context window. On top of the algorithm, we build DKernel, an LLM training and inference kernel library that allows users to customize sparsity patterns for their own models. We open-sourced DKernel and made it compatible with Megatron, PyTorch, and vLLM.
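
The sharding idea in this entry can be pictured with a toy example: each head is assigned every `num_heads`-th position, and the shards together cover the whole context. This is only a sketch of a plain stride pattern under that assumption; the actual S2-Attention patterns and the DKernel API are not described here, and the function names below are hypothetical.

```python
def strided_shard(seq_len, num_heads, head):
    """Key positions assigned to `head` under a simple stride pattern (assumed)."""
    return list(range(head, seq_len, num_heads))

def union_of_shards(seq_len, num_heads):
    """Collect the positions covered by all heads together."""
    covered = set()
    for h in range(num_heads):
        covered.update(strided_shard(seq_len, num_heads, h))
    return covered

shards = [strided_shard(8, 4, h) for h in range(4)]
print(shards)  # [[0, 4], [1, 5], [2, 6], [3, 7]]

# Each head sees 1/4 of the context, yet no position is lost overall.
print(union_of_shards(8, 4) == set(range(8)))  # True
```

In this toy setting each head stores and attends to a quarter of the keys, which is where the per-head compute and KV cache savings would come from.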

attention head, reduction, s2-attention, (15 more...)

arXiv.org Artificial Intelligence

2407.17678

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria > Vienna (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Speakers Unembedded: Embedding-free Approach to Long-form Neural Diarization

Li, Xiang, Govindan, Vivek, Paturi, Rohit, Srinivasan, Sundararajan

arXiv.org Artificial Intelligence, Jun-26-2024

End-to-end neural diarization (EEND) models offer significant improvements over traditional embedding-based Speaker Diarization (SD) approaches but fall short in generalizing to long-form audio with a large number of speakers. The EEND-vector-clustering method mitigates this by combining local EEND with global clustering of speaker embeddings from local windows, but this requires an additional speaker embedding framework alongside the EEND module. In this paper, we propose a novel framework applying EEND both locally and globally for long-form audio without separate speaker embeddings. This approach achieves significant relative DER reductions of 13% and 10% over the conventional 1-pass EEND on the Callhome American English and RT03-CTS datasets, respectively, and marginal improvements over EEND-vector-clustering without the need for additional speaker embeddings. Furthermore, we discuss the computational complexity of our proposed framework and explore strategies for reducing processing times.

diarization, eend, local window, (15 more...)

arXiv.org Artificial Intelligence

2406.18679

Country: North America > United States > New Jersey (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

ARNN: Attentive Recurrent Neural Network for Multi-channel EEG Signals to Identify Epileptic Seizures

Rukhsar, Salim, Tiwari, Anil Kumar

arXiv.org Artificial Intelligence, Mar-5-2024

We propose an Attentive Recurrent Neural Network (ARNN), which recurrently applies attention layers along a sequence and has linear complexity with respect to the sequence length. The proposed model operates on multi-channel EEG signals rather than single-channel signals and leverages parallel computation. In this cell, the attention layer is a computational unit that efficiently applies self-attention and cross-attention mechanisms to compute a recurrent function over a wide number of state vectors and input signals. Our architecture is inspired in part by the attention layer and long short-term memory (LSTM) cells, and it uses long-short style gates, but it scales this typical cell up by several orders to parallelize for multi-channel EEG signals. It inherits the advantages of attention layers and LSTM gates while avoiding their respective drawbacks. We evaluated the model's effectiveness through extensive experiments with heterogeneous datasets, including the CHB-MIT dataset and the UPenn and Mayo Clinic dataset. The empirical findings suggest that the ARNN model outperforms baseline methods such as LSTM, Vision Transformer (ViT), Compact Convolution Transformer (CCT), and R-Transformer (RT), showcasing superior performance and faster processing capabilities across a wide range of tasks. The code has been made publicly accessible at https://github.com/Salim-Lysiun/ARNN.

architecture, dependency, sequence, (14 more...)

arXiv.org Artificial Intelligence

2403.03276

Country:

Oceania > Australia (0.04)
North America > United States > Massachusetts (0.04)
Asia > India (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.52)
Health & Medicine > Therapeutic Area > Genetic Disease (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On the Efficacy of Eviction Policy for Key-Value Constrained Generative Language Model Inference

Ren, Siyu, Zhu, Kenny Q.

arXiv.org Artificial Intelligence, Feb-9-2024

Despite the recent success associated with Large Language Models (LLMs), they are notably cost-prohibitive to deploy in resource-constrained environments due to their excessive memory and computational demands. In addition to model parameters, the key-value cache is also stored in GPU memory, growing linearly with batch size and sequence length. As a remedy, recent works have proposed various eviction policies for keeping the overhead of the key-value cache under a given budget. This paper examines the efficacy of existing eviction policies in terms of importance score calculation and eviction scope construction. We identify the deficiency of prior policies in these two aspects and introduce RoCo, a robust cache omission policy based on temporal attention scores and robustness measures. Extensive experimentation spanning prefilling and auto-regressive decoding stages validates the superiority of RoCo. Finally, we release EasyKV, a versatile software package dedicated to user-friendly key-value constrained generative inference. Code available at https://github.com/DRSY/EasyKV.
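
To make the notion of a budgeted eviction policy concrete, here is a minimal generic sketch. It is not RoCo and does not use the EasyKV API: it simply assumes one importance score per cached token, protects a small window of the most recent tokens from eviction, and fills the rest of the budget with the highest-scoring older tokens. The function name `evict` and the `recent` parameter are hypothetical.

```python
def evict(scores, budget, recent=2):
    """Return sorted indices of cache entries to keep within `budget`.

    scores: one importance score per cached token (for instance,
            accumulated attention weight; an assumption of this sketch).
    recent: number of latest tokens that are never evicted.
    """
    n = len(scores)
    if n <= budget:
        return list(range(n))
    recent = min(recent, budget)
    keep = set(range(n - recent, n))
    # Rank the older tokens by score and keep the best ones.
    older = sorted(range(n - recent), key=lambda i: scores[i], reverse=True)
    keep.update(older[: budget - recent])
    return sorted(keep)

print(evict([0.9, 0.1, 0.5, 0.2, 0.3, 0.4], budget=4))  # [0, 2, 4, 5]
```

The two design questions named in the abstract map onto this sketch loosely: how `scores` is computed, and which tokens are eligible for eviction in the first place.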

arxiv preprint arxiv, broadway, eviction policy, (13 more...)

arXiv.org Artificial Intelligence

2402.06262

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Oklahoma (0.04)
North America > United States > Texas (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)

Contrastive Label Disambiguation for Self-Supervised Terrain Traversability Learning in Off-Road Environments

Xue, Hanzhang, Hu, Xiaochang, Xie, Rui, Fu, Hao, Xiao, Liang, Nie, Yiming, Dai, Bin

arXiv.org Artificial Intelligence, Jul-6-2023

Discriminating the traversability of terrains is a crucial task for autonomous driving in off-road environments. However, it is challenging due to the diverse, ambiguous, and platform-specific nature of off-road traversability. In this paper, we propose a novel self-supervised terrain traversability learning framework, utilizing a contrastive label disambiguation mechanism. First, weakly labeled training samples with pseudo labels are automatically generated by projecting actual driving experiences onto the terrain models constructed in real time. Subsequently, a prototype-based contrastive representation learning method is designed to learn distinguishable embeddings, facilitating the self-supervised updating of those pseudo labels. Through the iterative interaction between representation learning and pseudo label updating, the ambiguities in those pseudo labels are gradually eliminated, enabling the learning of platform-specific and task-specific traversability without any human-provided annotations. Experimental results on the RELLIS-3D dataset and our Gobi Desert driving dataset demonstrate the effectiveness of the proposed method.

artificial intelligence, machine learning, traversability, (17 more...)

arXiv.org Artificial Intelligence

2307.02871

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window

Koo, Jinkyu, Yang, John, An, Le, Sergio, Gwenaelle Cunha, Park, Su Inn

arXiv.org Artificial Intelligence, Jun-23-2023

Transformer models have shown great potential in computer vision, following their success in language tasks. Swin Transformer is one such model that outperforms convolution-based architectures in terms of accuracy, while improving efficiency when compared to Vision Transformer (ViT) and its variants, which have quadratic complexity with respect to the input size. Swin Transformer features shifting windows that allow cross-window connection while limiting self-attention computation to non-overlapping local windows. However, shifting windows introduces memory copy operations, which account for a significant portion of its runtime. To mitigate this issue, we propose Swin-Free, in which we apply size-varying windows across stages, instead of shifting windows, to achieve cross-connection among local windows. With this simple design change, Swin-Free runs faster than the Swin Transformer at inference with better accuracy. Furthermore, we also propose a few Swin-Free variants that are faster than their Swin Transformer counterparts.
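
The notion of non-overlapping local windows, and of changing the window size between stages, can be shown on a small token grid. The sketch below only groups token coordinates; it is an illustration under assumed names (`window_partition`) and sizes, not the Swin or Swin-Free code, and the stage configuration used by the authors is not given in this entry.

```python
def window_partition(height, width, window):
    """Group (row, col) token coordinates into non-overlapping square windows."""
    if height % window or width % window:
        raise ValueError("grid size must be divisible by the window size")
    windows = {}
    for r in range(height):
        for c in range(width):
            windows.setdefault((r // window, c // window), []).append((r, c))
    return windows

small = window_partition(4, 4, 2)  # e.g. an earlier stage: four 2x2 windows
large = window_partition(4, 4, 4)  # e.g. a later stage: one 4x4 window

print(len(small), len(large))  # 4 1
print(small[(0, 0)])           # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Tokens (0, 0) and (3, 3) never share a window at size 2 but do at size 4, which is the kind of cross-window connection a larger window provides without any shifting.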

artificial intelligence, machine learning, swin-free, (18 more...)

arXiv.org Artificial Intelligence

2306.13776

Country: North America > United States > California > Santa Clara County > Santa Clara (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

BOAT: Bilateral Local Attention Vision Transformer

Yu, Tan, Zhao, Gangming, Li, Ping, Yu, Yizhou

arXiv.org Artificial Intelligence, Oct-19-2022

Vision Transformers have achieved outstanding performance in many computer vision tasks. Early Vision Transformers such as ViT and DeiT adopt global self-attention, which is computationally expensive when the number of patches is large. To improve efficiency, recent Vision Transformers adopt local self-attention mechanisms, where self-attention is computed within local windows. Although window-based local self-attention significantly boosts efficiency, it fails to capture the relationships between distant but similar patches in the image plane. To overcome this limitation of image-space local attention, in this paper, we further exploit the locality of patches in the feature space. We group the patches into multiple clusters using their features, and self-attention is computed within every cluster. Such feature-space local attention effectively captures the connections between patches that lie in different local windows but are still relevant to each other. We propose a Bilateral lOcal Attention vision Transformer (BOAT), which integrates feature-space local attention with image-space local attention. We further integrate BOAT with both Swin and CSWin models, and extensive experiments on several benchmark datasets demonstrate that our BOAT-CSWin model clearly and consistently outperforms existing state-of-the-art CNN models and vision Transformers.

artificial intelligence, local attention, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2201.13027

Country:

North America > Canada > Quebec > Montreal (0.05)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(12 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
